Reset facility for redundant processor using only fibre
channel loop

The present invention relates to a method for
controlling and resetting one processor in a redundant
pair using only the fibre channel loops.

Any highly available system with dual (or more)
redundant processors requires a method of forcing a
faulty processor off the system, to prevent it
disrupting normal operation. Normally this
functionality requires dedicated cabling, thus making
it difficult to expand the system easily beyond a
dedicated backplane.

Redundant processors would normally have dedicated 
wiring between them to allow an errant one to be reset 
or powered down. 

Dedicated wiring requires extra PWB traces and extra
cabling between processors. This is both expensive and
contributes to unreliability.

This invention overcomes the above problem by using
special purpose hardware, referred to as a HASC (High
Availability Support Chip). Using a HASC, one
processor can interrogate and control the reset signals
of another, thus forcing it off the fibre channel loop
if necessary.



This invention could be used in a high availability
version of Intelligent Network Application Protocol
(INAP), or any other redundant processing system using
fibre channel as a communications medium.

This invention could allow high availability server
systems to be offered using existing backplanes and
cabling systems.

The invention allows the building of a high
availability, scalable file server that does not
require additional inter-processor wiring.

Furthermore, the invention improves performance by 
accelerating lock management functions. 

Other features include:

• High availability: no single point of failure.

• Scalable: more INAPs can be added, allowing an
approximately linear performance increase.

• Fast, redundant, distributed, scalable lock management.

• Ability for one INAP to reset another if it detects
that it is faulty.

• All communications over the FC-AL loop, thus scalable
beyond a shelf, even into two separate geographical
locations.

Details: 

Each INAP has an FC-AL sub-system as shown in Figure 1.

Note that the FC-AL interface chips are already on the
existing INAP design.

CAM: Content Addressable Memory

A special type of memory with a built-in search
capability. You give it the contents you are looking
for, and it returns the address if there is a match.
Commonly used in routers and switches.
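
The lookup behaviour described above can be modelled in software as
follows. This is a minimal sketch only: the class name `Cam`, its
methods and the fixed size are hypothetical and not taken from the INAP
design, and a real CAM compares all cells in parallel in hardware
rather than scanning them one by one.

```python
class Cam:
    """Toy software model of a content addressable memory (hypothetical)."""

    def __init__(self, size):
        # One stored word per address; None marks an empty cell.
        self.cells = [None] * size

    def write(self, address, word):
        """Store `word` at `address`, as the local CPU would."""
        self.cells[address] = word

    def search(self, word):
        """Return the address of the first cell holding `word`, or None."""
        for address, stored in enumerate(self.cells):
            if stored is not None and stored == word:
                return address
        return None


cam = Cam(size=4)
cam.write(2, "lock:/vol0/file1")
print(cam.search("lock:/vol0/file1"))
print(cam.search("lock:/vol0/other"))
```

The first search finds the stored word and reports its address; the
second finds no match.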

SERDES: Serialiser/Deserialiser

Converts serial FC data into parallel data at 1/10th or
1/20th of the speed.
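
As a worked example of the speed reduction, the sketch below divides a
serial line rate by the parallel word width. The line rate of 1.0625
Gbaud is an assumption (the 1 Gbit/s Fibre Channel rate); the source
does not state which rate the INAP uses, and the function name is
hypothetical.

```python
# Assumed line rate: 1 Gbit/s Fibre Channel signals at 1.0625 Gbaud.
FC_LINE_RATE_BAUD = 1_062_500_000


def parallel_clock_hz(line_rate_baud, width_bits):
    """Clock rate of the parallel side for a given word width."""
    return line_rate_baud // width_bits


print(parallel_clock_hz(FC_LINE_RATE_BAUD, 10))  # 10-bit parallel interface
print(parallel_clock_hz(FC_LINE_RATE_BAUD, 20))  # 20-bit parallel interface
```

Under that assumption the parallel side runs at 106.25 MHz for a 10-bit
interface and 53.125 MHz for a 20-bit interface.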

HASC: High Availability Support Chip 

Overview Points:

HASC allows the CAM to be read and written by the 
INAP CPU. The CAM is used to store lock data. 
The CAM can be searched by the local CPU or by any 
other device on the FC-AL loop, via the SERDES and 
HASC. 

The HASC has the facility to reset or interrupt
the INAP module, via a command received over
either FC-AL loop.

The HASC is 100% hardware, thus allowing fast CAM 
lookups and predictable reset operation. 

OVERALL SYSTEM OPERATION 

A system comprises one or more shelves, with two or
more INAP-HAs.

Each INAP-HA has two expansion ports, thus allowing
expansion with shelves containing INAP-HAs only, with
no I/O cards.

At system initialisation time each INAP-HA twins with a
"buddy", preferably not in the same shelf (for added
reliability).
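
The twinning preference can be illustrated with a simple greedy
pairing. This is a sketch under stated assumptions, not the method used
by the INAP-HA: the source does not say how buddies are chosen, and the
function name, the `(name, shelf)` tuples and the greedy strategy are
all hypothetical.

```python
def pair_buddies(nodes):
    """Pair nodes, preferring partners on a different shelf.

    `nodes` is a list of (name, shelf) tuples. Returns a dict mapping
    each paired node name to its buddy's name. With an odd number of
    nodes, the last one is left unpaired (available as a spare).
    """
    unpaired = list(nodes)
    buddies = {}
    while len(unpaired) >= 2:
        name, shelf = unpaired.pop(0)
        # Index of the first candidate on another shelf; fall back to
        # index 0 (same shelf) when no such candidate exists.
        index = next(
            (i for i, (_, s) in enumerate(unpaired) if s != shelf), 0
        )
        partner, _ = unpaired.pop(index)
        buddies[name] = partner
        buddies[partner] = name
    return buddies


print(pair_buddies([("A", 0), ("B", 0), ("C", 1), ("D", 1)]))
```

In this example A twins with C and B twins with D, so no pair shares a
shelf.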

Each INAP-HA regularly talks with its buddy, checking
that it is OK, using a high level watchdog type system,
over the FC-AL loops.

Whenever an INAP-HA takes out a lock on a file system
resource, its buddy also puts the lock data into its CAM.
Thus the lock data is redundantly stored without every
INAP-HA having to hold all the lock data. The solution
is thus scalable.
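
The effect of mirroring each lock to the buddy only can be sketched as
below. The class `InapHa`, the helper `twin` and the use of a Python
set in place of the CAM are hypothetical and serve only to show that a
lock ends up in exactly two places, however many INAP-HAs are present.

```python
class InapHa:
    """Hypothetical model of one INAP-HA; the set stands in for its CAM."""

    def __init__(self, name):
        self.name = name
        self.cam = set()
        self.buddy = None

    def take_lock(self, resource):
        # The owner records the lock and its buddy records a copy.
        self.cam.add(resource)
        if self.buddy is not None:
            self.buddy.cam.add(resource)


def twin(a, b):
    """Make two INAP-HAs each other's buddy."""
    a.buddy, b.buddy = b, a


nodes = [InapHa(n) for n in "ABCD"]
twin(nodes[0], nodes[1])
twin(nodes[2], nodes[3])
nodes[0].take_lock("/vol0/file1")
print([n.name for n in nodes if "/vol0/file1" in n.cam])
```

Only A and its buddy B hold the lock data; C and D store nothing for
it, which is why CAM usage per INAP-HA does not grow with system size.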

When an INAP-HA wishes to check a lock it puts a
broadcast frame on the loop (using either the standard
interface chip or the HASC, TBD). Each HASC retrieves the
frame, searches its CAM and passes the frame on,
marking it as a "hit" if the lock exists. The frame
will arrive back at the originator having been checked
by all HASCs.
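
The lock check can be sketched as a frame passed once around the loop.
The frame layout is not defined in the source, so the `LockQueryFrame`
fields (including the `checked_by` counter), the `Hasc` class and the
`query_loop` function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LockQueryFrame:
    """Hypothetical broadcast frame asking whether a lock exists."""
    resource: str
    hit: bool = False
    checked_by: int = 0  # illustrative counter, not part of the source


class Hasc:
    """Hypothetical HASC model; the set stands in for its CAM."""

    def __init__(self, locks=()):
        self.cam = set(locks)

    def forward(self, frame):
        # Search the local CAM, mark a hit if found, then pass it on.
        if frame.resource in self.cam:
            frame.hit = True
        frame.checked_by += 1
        return frame


def query_loop(loop, resource):
    """Send a query past every HASC in loop order and return it."""
    frame = LockQueryFrame(resource)
    for hasc in loop:
        frame = hasc.forward(frame)
    return frame


loop = [Hasc(), Hasc({"/vol0/file1"}), Hasc()]
print(query_loop(loop, "/vol0/file1"))
print(query_loop(loop, "/vol0/file2"))
```

A single hit anywhere on the loop is enough to mark the frame, so the
originator needs only the returned frame to learn whether the lock is
held.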

If one INAP-HA detects that its "buddy" has not sent
"I'm OK" type messages for an extended time, it first
checks that the loop is up; then, if further attempts to
communicate fail, it sends a "reset" frame to the HASC
on its buddy. This feature is called STOMITH (Shoot
The Other Machine In The Head). One of the INAP-HAs in
a set is designated the master; its watchdog timeout
is set to a shorter time than the other's, in order to
prevent both INAP-HAs resetting each other at the same
time.
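
The decision sequence above (timeout, loop check, retries, reset) can
be sketched as follows. The timeout values, the retry count and all
names are hypothetical; the source gives no figures and states only
that the master's timeout is the shorter one.

```python
class BuddyWatchdog:
    """Hypothetical watchdog one INAP-HA keeps on its buddy."""

    def __init__(self, timeout, max_retries=3):
        self.timeout = timeout          # seconds of silence tolerated
        self.max_retries = max_retries  # assumed number of retries
        self.last_heard = 0.0

    def heard_ok(self, now):
        """Record an "I'm OK" message from the buddy."""
        self.last_heard = now

    def decide(self, now, loop_up, retry):
        """Return "ok", "wait" or "reset".

        `retry` is a callable that returns True if the buddy answers.
        """
        if now - self.last_heard < self.timeout:
            return "ok"
        if not loop_up:
            # Loop is down: silence says nothing about the buddy.
            return "wait"
        for _ in range(self.max_retries):
            if retry():
                self.last_heard = now
                return "ok"
        return "reset"  # caller would send the reset frame to the HASC


master = BuddyWatchdog(timeout=5.0)  # shorter timeout (assumed value)
other = BuddyWatchdog(timeout=8.0)   # longer timeout (assumed value)
silent = lambda: False               # buddy never answers a retry
print(master.decide(6.0, True, silent))
print(other.decide(6.0, True, silent))
```

With both sides silent from time zero, the master reaches "reset" at
six seconds while the other is still within its timeout, so only one
reset frame is sent.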

During normal operation the INAP-HAs load share, in an
active-active manner.

If an INAP-HA loses its buddy it can buddy up with a 
spare, if available. 

There may be a requirement for more INAP-HA processors
than the natural limit (approx. 8 shelves, 16 INAP-HAs).
In this case there are four alternatives.

• Add extra shelves with no drives.

• Re-package INAP-HA into a format that can be loaded
from the front, instead of one or more disks, using
the SCA connectors.

• Design a custom backplane, capable of taking lots of
INAP-HAs, in a front loadable format.

• Design metalwork capable of holding INAP-HAs.



